Combining Rule-based and Data-driven Techniques for Grammatical Relation Extraction in Spoken Language
Abstract
We investigate an aspect of the relationship between parsing and corpus-based methods in NLP that has received relatively little attention: coverage augmentation in rule-based parsers. In the specific task of determining grammatical relations (such as subjects and objects) in transcribed spoken language, we show that a combination of rule-based and corpus-based approaches, where a rule-based system is used as the teacher (or an automatic data annotator) to a corpus-based system, outperforms either system in isolation.
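The teacher/student arrangement described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the part-of-speech tags, the two hand-written rules, the per-feature voting learner, and all function names below are hypothetical and chosen only to show the idea. A high-precision, low-coverage rule set labels raw data automatically, a simple corpus-based learner is trained on those labels, and the combined system falls back on the learner when no rule fires.

```python
from collections import Counter, defaultdict

# Hypothetical toy setup: each item is a (previous tag, tag, next tag) context
# for one word; the goal is to label the word as SUBJ, OBJ, or nothing (None).
NOMINAL = {"NOUN", "PRON"}


def rule_based_label(prev_tag, tag, next_tag):
    """Teacher: hand-written rules that return None when no rule fires."""
    if tag in NOMINAL and next_tag == "VERB":
        return "SUBJ"
    if tag in NOMINAL and prev_tag == "VERB":
        return "OBJ"
    return None


def features(prev_tag, tag, next_tag):
    """Independent features, so the student can generalize to new contexts."""
    return [("prev", prev_tag), ("tag", tag), ("next", next_tag)]


def train_student(unlabeled):
    """Student: count how often each feature co-occurs with a teacher label."""
    counts = defaultdict(Counter)
    for context in unlabeled:
        label = rule_based_label(*context)  # automatic annotation by the teacher
        if label is None:
            continue  # the teacher has no opinion, so there is nothing to learn
        for feat in features(*context):
            counts[feat][label] += 1
    return counts


def student_label(counts, context):
    """Sum the label counts of all features present; None if no evidence."""
    votes = Counter()
    for feat in features(*context):
        votes.update(counts.get(feat, {}))
    return votes.most_common(1)[0][0] if votes else None


def combined_label(counts, context):
    """Trust the rules where they apply; otherwise back off to the student."""
    label = rule_based_label(*context)
    return label if label is not None else student_label(counts, context)


# Unannotated contexts; only the teacher's rules supply labels for training.
unlabeled = [
    ("START", "PRON", "VERB"),
    ("VERB", "NOUN", "END"),
    ("DET", "NOUN", "VERB"),
    ("VERB", "PRON", "END"),
    ("ADV", "DET", "NOUN"),
]
student = train_student(unlabeled)

# A context outside the rules' coverage: a pronoun followed by an auxiliary.
uncovered = ("START", "PRON", "AUX")
print(rule_based_label(*uncovered), combined_label(student, uncovered))
```

In this toy data the rules alone return no label for the last context, while the combined labeler can still propose one from the feature counts. That is the coverage augmentation the abstract refers to, in miniature.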
Similar resources
Parsing of Grammatical Relations for Databases of Spoken Language
Despite the significant advances in syntactic parsing of written text, the application of these techniques to spontaneous spoken language has received more limited attention. The explosive growth of available corpora of transcribed spoken language opens up new opportunities in that direction. High accuracy parsers for spoken language will in turn provide a platform for development of a wide ran...
Parsing of Grammatical Relations in Transcripts of Parent-Child Dialogs Thesis Summary
Automatic analysis of syntax is one of the core problems in natural language processing. Despite significant advances in syntactic parsing of written text, the application of these techniques to spontaneous spoken language has received more limited attention. The recent explosive growth of online, accessible corpora of spoken language interactions opens up new opportunities for the development ...
A Multi-Strategy Approach for Parsing of Grammatical Relations in Transcripts of Parent-Child Dialogs
Automatic analysis of syntax is one of the core problems in natural language processing. Despite significant advances in syntactic parsing of written text, the application of these techniques to spontaneous spoken language has received more limited attention. The recent explosive growth of online, accessible corpora of spoken language interactions opens up new opportunities for the development ...
Application of the rule extraction method to evaluate seismicity of Iran
Assessing seismic hazards involves specifying the likelihood, magnitude and location of earthquakes in a region. Predicting seismic hazards is the first step in reducing the impact of the damage caused by an earthquake. In this study, to fully utilize all the known parameters which may possibly affect the occurrence of earthquakes (mb ≥ 4.5), a data-driven rule-extraction method called the...
Automatic Conversion of a Persian Dependency Treebank to a Constituency Treebank
There are two major types of treebanks: dependency-based and constituency-based. Both have applications in natural language processing and computational linguistics. Several dependency treebanks have been developed for Persian. However, no large constituency treebank is available for this language. In this paper, we aim to propose an algorithm for automatic conversion of a depe...
Publication date: 2003